pMT06 was transfected into various cell types, and mRNA barcode counts were sequenced together with pMT06 pDNA counts. Data for all P53 & GR reporters were collected in four different sequencing runs and will be analyzed here.
It looks like most of the variance is between the sequencing runs themselves. This makes sense, because I used two different pDNA libraries for the 4 sequencing runs. Some samples correlate with pDNA counts: these samples have pDNA contamination and need to be removed.
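A minimal sketch of how pDNA-contaminated samples could be flagged, assuming a barcode-by-sample count table. The toy data, the column names and the `cutoff` of 0.5 are all hypothetical and only illustrate the idea; they are not the values used in this analysis. Base R is used so the sketch runs without extra packages.

```r
# Toy data: rows are barcodes, columns are samples (all names are hypothetical).
set.seed(1)
pdna <- rexp(200, rate = 1 / 100)  # simulated pDNA barcode abundance
counts <- data.frame(
  pDNA         = pdna,
  clean_sample = rexp(200, rate = 1 / 100),             # independent of pDNA
  contaminated = 0.8 * pdna + rexp(200, rate = 1 / 10)  # mostly plasmid signal
)

# Pearson correlation of log-transformed counts with the pDNA column
sample_cols <- setdiff(names(counts), "pDNA")
cor_with_pdna <- sapply(
  counts[sample_cols],
  function(x) cor(log2(x + 1), log2(counts$pDNA + 1))
)

# Assumed threshold: samples above it are treated as pDNA-contaminated
cutoff <- 0.5
flagged <- names(cor_with_pdna)[cor_with_pdna > cutoff]
print(round(cor_with_pdna, 2))
print(flagged)
```

In practice the cutoff would have to be chosen by looking at the correlation distribution of the real samples.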
Stuff learned from the above figures:
Some samples can be excluded from further analysis because they don’t contain useful information. These samples are:
MCF7-KO-DMSO: rep2_seq1, rep1_seq1(?), r2_seq2(?), r1_seq3, r3_seq3
MCF7-KO-Nutlin: rep2_seq1, rep3_seq1(?), r1_seq2, r1_seq3, r3_seq3
MCF7-WT-DMSO: rep2_seq1, rep3_seq1
MCF7-WT-Nutlin: rep3_seq1, r2_seq2, r1_seq3
A549_DMSO: r2_seq3, r3_seq3
A549_Dex10: r2_seq3
A549_Dex100: r2_seq3
A549-Dex-1: r1_seq2, r2_seq3
mES-N2B27-HQ: rep1_seq1
mES-N2B27-RA: rep1_seq1, rep2_seq1
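The exclusion list above could be applied as a simple filter on the `sample` column. This is only a sketch: the sample names in `excluded_samples` are hypothetical placeholders, because the list above uses shorthand rather than the full sample identifiers, and `bc_toy` stands in for the real barcode data frame.

```r
# Toy barcode table (hypothetical sample names and values)
bc_toy <- data.frame(
  sample = c("MCF7_KO_DMSO_r2_seq1", "MCF7_KO_DMSO_r1_seq2",
             "A549_DMSO_r2_seq3", "A549_DMSO_r1_seq1"),
  rpm    = c(12, 340, 8, 410)
)

# Samples judged uninformative (placeholders, not the real identifiers)
excluded_samples <- c("MCF7_KO_DMSO_r2_seq1", "A549_DMSO_r2_seq3")

# Keep only rows whose sample is not on the exclusion list
bc_filtered <- bc_toy[!bc_toy$sample %in% excluded_samples, ]
print(bc_filtered)
```

Keeping the excluded names in one vector makes it easy to revisit the samples marked with (?) later.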
# It might be hard to compare the KO with the WT barcode counts because in the KO we don't really have active elements -> 'inactive' elements will take up most of the reads
## How can we scale the different conditions correctly?
# Raw rpm of the P53 negative-control reporters in the MCF7 samples
ggplot(bc_df %>%
         filter(neg_ctrls == "Yes", str_detect(tf, "53"), str_detect(sample, "MCF")) %>%
         dplyr::select(tf, sample, rpm, reporter_id, gcf) %>%
         unique(),
       aes(x = sample, y = rpm, color = tf)) +
  geom_quasirandom(dodge.width = 0.75) +
  scale_color_brewer(palette = "Dark2") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  facet_wrap(~gcf)

# Background per sample: mean rpm of the negative controls whose tf matches "53"
bc_df_scale <- bc_df %>%
  filter(neg_ctrls == "Yes", str_detect(tf, "53")) %>%
  dplyr::select(sample, rpm, reporter_id, gcf) %>%
  unique() %>%
  group_by(sample) %>%
  mutate(rpm = mean(rpm)) %>%
  ungroup() %>%
  dplyr::select(-reporter_id) %>%
  unique() %>%
  dplyr::select("background_rpm" = rpm, sample) %>%
  unique() %>%
  # pDNA samples are not scaled
  filter(str_detect(sample, "pDNA", negate = TRUE))

# Add the background to every row and scale rpm by it
bc_df <- merge(bc_df, bc_df_scale, all = TRUE)
bc_df <- bc_df %>%
  mutate(rpm_norm = rpm / background_rpm)
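A small worked example of the scaling above, on made-up numbers, to check what the normalization does: after dividing by the per-sample background, the negative controls average 1 in every sample, and a sample with 5-fold more reads gives the same `rpm_norm` values. The toy data frame and sample names are hypothetical; only `neg_ctrls`, `rpm`, `background_rpm` and `rpm_norm` mirror the columns used above. Base R is used so the sketch runs without extra packages.

```r
# Two toy samples; sample_B has 5x higher rpm than sample_A throughout
toy <- data.frame(
  sample    = rep(c("sample_A", "sample_B"), each = 4),
  neg_ctrls = rep(c("Yes", "Yes", "No", "No"), times = 2),
  rpm       = c(10, 30, 200, 40,     # sample_A: negative controls average 20
                50, 150, 1000, 200)  # sample_B: negative controls average 100
)

# Per-sample background: mean rpm of the negative controls
background <- aggregate(rpm ~ sample,
                        data = toy[toy$neg_ctrls == "Yes", ],
                        FUN = mean)
names(background)[names(background) == "rpm"] <- "background_rpm"

# Attach the background to every row and scale
toy <- merge(toy, background, by = "sample", all = TRUE)
toy$rpm_norm <- toy$rpm / toy$background_rpm

# Mean normalized signal of the negative controls, per sample
is_neg <- toy$neg_ctrls == "Yes"
neg_means <- tapply(toy$rpm_norm[is_neg], toy$sample[is_neg], mean)
print(toy)
print(neg_means)
```

This only removes a global per-sample scale factor. It does not address the concern that inactive elements take up most of the reads in the KO.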
# Same plot as before, now with the background-normalized rpm
ggplot(bc_df %>%
         filter(neg_ctrls == "Yes", str_detect(tf, "53"), str_detect(sample, "MCF")) %>%
         dplyr::select(tf, sample, rpm_norm, reporter_id, gcf) %>%
         unique(),
       aes(x = sample, y = rpm_norm, color = tf)) +
  geom_quasirandom(dodge.width = 0.75) +
  scale_color_brewer(palette = "Dark2") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  facet_wrap(~gcf)
# looks quite ok, MCF7_WT_DMSO_r1_gcf6412 has some outliers - maybe I need to remove this sample

Divide cDNA barcode counts by pDNA barcode counts
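A minimal sketch of the cDNA/pDNA division, assuming one row per barcode with the cDNA and pDNA values side by side. The column names `cDNA_rpm`, `pDNA_rpm` and `activity` are hypothetical, and returning `NA` for barcodes that are absent from the pDNA library is an assumption of this sketch, not necessarily how the analysis handles them.

```r
# Toy table: one row per barcode (hypothetical names and values)
toy <- data.frame(
  barcode  = c("bc1", "bc2", "bc3"),
  cDNA_rpm = c(120, 30, 0),
  pDNA_rpm = c(60, 60, 0)
)

# Activity = cDNA signal relative to plasmid input.
# Barcodes without pDNA counts get NA instead of Inf or NaN.
toy$activity <- ifelse(toy$pDNA_rpm > 0,
                       toy$cDNA_rpm / toy$pDNA_rpm,
                       NA_real_)
print(toy)
```

Dividing by the pDNA counts corrects for barcodes that are over- or under-represented in the plasmid library.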
paste("Run time: ",format(Sys.time()-StartTime))
## [1] "Run time: 23.01659 mins"
getwd()
## [1] "/DATA/usr/m.trauernicht/projects/SuRE_deep_scan_trp53_gr/analyses"
date()
## [1] "Wed Jul 21 15:03:27 2021"
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 grid parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] PCAtools_2.2.0 ggrepel_0.9.1
## [3] DESeq2_1.30.1 SummarizedExperiment_1.20.0
## [5] Biobase_2.50.0 MatrixGenerics_1.2.1
## [7] matrixStats_0.59.0 GenomicRanges_1.42.0
## [9] GenomeInfoDb_1.26.7 IRanges_2.24.1
## [11] S4Vectors_0.28.1 BiocGenerics_0.36.1
## [13] tidyr_1.1.3 viridis_0.6.1
## [15] viridisLite_0.4.0 ggpointdensity_0.1.0
## [17] ggbiplot_0.55 scales_1.1.1
## [19] factoextra_1.0.7 shiny_1.6.0
## [21] pheatmap_1.0.12 gridExtra_2.3
## [23] RColorBrewer_1.1-2 readr_1.4.0
## [25] haven_2.4.1 ggbeeswarm_0.6.0
## [27] plotly_4.9.4.1 tibble_3.1.2
## [29] dplyr_1.0.7 vwr_0.3.0
## [31] latticeExtra_0.6-29 lattice_0.20-41
## [33] stringdist_0.9.6.3 GGally_2.1.2
## [35] ggpubr_0.4.0 ggplot2_3.3.5
## [37] stringr_1.4.0 plyr_1.8.6
## [39] data.table_1.14.0
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.2.1
## [3] lazyeval_0.2.2 splines_4.0.5
## [5] crosstalk_1.1.1 BiocParallel_1.24.1
## [7] digest_0.6.27 htmltools_0.5.1.1
## [9] fansi_0.5.0 magrittr_2.0.1
## [11] memoise_2.0.0 openxlsx_4.2.4
## [13] annotate_1.68.0 jpeg_0.1-8.1
## [15] colorspace_2.0-2 blob_1.2.1
## [17] xfun_0.24 crayon_1.4.1
## [19] RCurl_1.98-1.3 jsonlite_1.7.2
## [21] genefilter_1.72.1 survival_3.2-10
## [23] glue_1.4.2 gtable_0.3.0
## [25] zlibbioc_1.36.0 XVector_0.30.0
## [27] DelayedArray_0.16.3 car_3.0-11
## [29] BiocSingular_1.6.0 abind_1.4-5
## [31] DBI_1.1.1 rstatix_0.7.0
## [33] Rcpp_1.0.7 xtable_1.8-4
## [35] dqrng_0.3.0 foreign_0.8-81
## [37] bit_4.0.4 rsvd_1.0.5
## [39] htmlwidgets_1.5.3 httr_1.4.2
## [41] ellipsis_0.3.2 pkgconfig_2.0.3
## [43] reshape_0.8.8 XML_3.99-0.6
## [45] farver_2.1.0 sass_0.4.0
## [47] locfit_1.5-9.4 utf8_1.2.1
## [49] tidyselect_1.1.1 labeling_0.4.2
## [51] rlang_0.4.11 reshape2_1.4.4
## [53] later_1.2.0 AnnotationDbi_1.52.0
## [55] munsell_0.5.0 cellranger_1.1.0
## [57] tools_4.0.5 cachem_1.0.5
## [59] cli_3.0.0 generics_0.1.0
## [61] RSQLite_2.2.7 broom_0.7.8
## [63] evaluate_0.14 fastmap_1.1.0
## [65] yaml_2.2.1 knitr_1.33
## [67] bit64_4.0.5 zip_2.2.0
## [69] purrr_0.3.4 sparseMatrixStats_1.2.1
## [71] mime_0.11 rstudioapi_0.13
## [73] compiler_4.0.5 beeswarm_0.4.0
## [75] curl_4.3.2 png_0.1-7
## [77] ggsignif_0.6.2 geneplotter_1.68.0
## [79] bslib_0.2.5.1 stringi_1.7.2
## [81] highr_0.9 forcats_0.5.1
## [83] Matrix_1.3-2 vctrs_0.3.8
## [85] pillar_1.6.1 lifecycle_1.0.0
## [87] jquerylib_0.1.4 cowplot_1.1.1
## [89] bitops_1.0-7 irlba_2.3.3
## [91] httpuv_1.6.1 R6_2.5.0
## [93] promises_1.2.0.1 rio_0.5.27
## [95] vipor_0.4.5 assertthat_0.2.1
## [97] withr_2.4.2 GenomeInfoDbData_1.2.4
## [99] hms_1.1.0 beachmat_2.6.4
## [101] rmarkdown_2.9 DelayedMatrixStats_1.12.3
## [103] carData_3.0-4